A Hybrid Unsupervised Density-based Approach with Mutual Information for Text Outlier Detection
Authors
Abstract
The detection of outliers in text documents is a highly challenging task, primarily due to the unstructured nature of documents and the curse of dimensionality. Text document outliers refer to text data that deviates from the text found in other documents belonging to the same category. Mining text document outliers has wide applications in various domains, including spam email identification, digital libraries, medical archives, enhancing the performance of web search engines, and cleaning corpora used in document classification. To address the issue of dimensionality, it is crucial to employ feature selection techniques that reduce the large number of features without compromising their representativeness of the domain. In this paper, we propose a hybrid density-based approach that incorporates mutual information for text document outlier detection. The proposed approach utilizes normalized mutual information to identify the most distinct features that characterize the target domain. Subsequently, we customize the well-known local outlier factor algorithm to suit text document datasets. To evaluate the effectiveness of the proposed approach, we conduct experiments on synthetic and real datasets comprising twelve high-dimensional datasets. The results demonstrate that the proposed approach consistently outperforms conventional methods, achieving an average improvement of 5.73% in terms of the AUC metric. These findings highlight the remarkable enhancements achieved by leveraging normalized mutual information in conjunction with a density-based algorithm, particularly...
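The two stages named in the abstract, feature selection by normalized mutual information followed by local outlier factor (LOF) scoring, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say what the mutual information is measured against, so the sketch assumes a binary domain label per document, and it uses a textbook LOF with exactly k neighbours rather than the customized variant the paper describes. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_information(x, y):
    """NMI(x, y) = I(x; y) / sqrt(H(x) * H(y)); 0.0 if either entropy is zero."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(x, y)
    return (hx + hy - hxy) / math.sqrt(hx * hy)


def select_terms(docs, labels, top_n):
    """Rank terms by NMI between term presence and the (assumed) domain label."""
    vocab = sorted({term for doc in docs for term in doc})
    scores = {
        term: normalized_mutual_information([term in doc for doc in docs], labels)
        for term in vocab
    }
    return sorted(vocab, key=lambda term: -scores[term])[:top_n]


def cosine_distance(a, b):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def local_outlier_factor(points, k, dist):
    """Textbook LOF with exactly k neighbours per point (distance ties cut by index)."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for i in range(n)
    ]
    k_dist = [d[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = sum(max(k_dist[j], d[i][j]) for j in knn[i]) / k
        return 1.0 / max(reach, 1e-12)  # guard against duplicate points

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[j] for j in knn[i]) / (k * lrds[i]) for i in range(n)]
```

In use, documents would be represented as sets of tokens, reduced to the terms returned by `select_terms`, turned into vectors over those terms, and passed to `local_outlier_factor` with `cosine_distance`. Scores near 1 indicate a document whose local density matches its neighbours; larger scores indicate likely outliers.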
Similar resources
A Local Density-Based Approach for Local Outlier Detection
This paper presents a simple but effective density-based outlier detection approach with the local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of...
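The idea of a relative density score built on a local KDE can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited paper: it uses plain k nearest neighbours instead of the extended neighbourhood the snippet mentions, a Gaussian kernel, and a fixed bandwidth `h`. The function name and parameters are hypothetical.

```python
import math


def rdos_scores(points, k, h):
    """Relative density score: mean neighbour density divided by own density."""
    n, dim = len(points), len(points[0])
    norm = (2 * math.pi) ** (dim / 2) * h ** dim  # Gaussian kernel normaliser

    def kernel(a, b):
        return math.exp(-math.dist(a, b) ** 2 / (2 * h * h)) / norm

    # Assumption: plain k nearest neighbours stand in for the extended neighbourhood.
    neigh = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]
    # Local KDE at each point, averaged over its neighbours and the point itself.
    dens = [
        sum(kernel(points[i], points[j]) for j in neigh[i] + [i]) / (k + 1)
        for i in range(n)
    ]
    return [sum(dens[j] for j in neigh[i]) / (k * dens[i]) for i in range(n)]
```

A score close to 1 means the point is about as dense as its neighbours, while a score well above 1 means its neighbours sit in a denser region than it does, which is the signature of a local outlier.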
Outlier Detection in Dataset using Hybrid Approach
An outlier is a data point that deviates too much from the rest of the dataset. Most real-world datasets contain outliers. Outlier analysis is one of the techniques in data mining whose task is to discover the data that behave exceptionally compared to the remaining dataset. Outlier detection plays an important role in the data mining field. Outlier detection is useful in many fields like Medical, Netwo...
RODHA: Robust Outlier Detection using Hybrid Approach
The task of outlier detection is to find the small groups of data objects that are exceptional to the inherent behavior of the rest of the data. Detection of such outliers is fundamental to a variety of database and analytic tasks such as fraud detection and customer migration. There are several approaches[10] of outlier detection employed in many study areas amongst which distance based and de...
Outlier Detection for Text Data
The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero. In such cases, it often becomes difficult to separate the outliers from the natural variations in the patterns in the underlying data. In this paper, we present a matrix factorization method, which is naturally able to distin...
Intrusion Detection based on a Novel Hybrid Learning Approach
Information security and Intrusion Detection System (IDS) plays a critical role in the Internet. IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...
Journal
Journal title: International Journal of Intelligent Systems and Applications
Year: 2023
ISSN: 2074-904X, 2074-9058
DOI: https://doi.org/10.5815/ijisa.2023.05.04